Learning Visually Grounded Words and Syntax of Natural Spoken Language

Author

  • Deb Roy

Abstract

Properties of the physical world have shaped human evolutionary design and given rise to physically grounded mental representations. These grounded representations provide the foundation for higher-level cognitive processes including language. Most natural language processing machines to date lack grounding. This paper advocates the creation of physically grounded language learning machines as a path toward scalable systems which can conceptualize and communicate about the world in human-like ways. As steps in this direction, two experimental language acquisition systems are presented. The first system, CELL, is able to learn acoustic word forms and associated shape and color categories from fluent untranscribed speech paired with video camera images. In evaluations, CELL has successfully learned from spontaneous infant-directed speech. A version of CELL has been implemented in a robotic embodiment which can verbally interact with human partners. The second system, DESCRIBER, acquires a visually-grounded model of natural language which it uses to generate spoken descriptions of objects in visual scenes. Input to DESCRIBER’s learning algorithm consists of computer-generated scenes paired with natural language descriptions produced by a human teacher. DESCRIBER learns a three-level language model which encodes syntactic and semantic properties of phrases, word classes, and words. The system learns from a simple ‘show-and-tell’ procedure, and once trained, is able to generate semantically appropriate, contextualized, and syntactically well-formed descriptions of objects in novel scenes.

Similar resources

Learning visually grounded words and syntax for a scene description task

A spoken language generation system has been developed that learns to describe objects in computer-generated visual scenes. The system is trained by a ‘show-and-tell’ procedure in which visual scenes are paired with natural language descriptions. Learning algorithms acquire probabilistic structures which encode the visual semantics of phrase structure, word classes, and individual words. Using ...

The Impact of Language Learning Activities on the Spoken Language Development of 5-6-Year-Old Children in Private Preschool Centers of Langroud

N. Bagheri, M.A.; E. Abbasi, Ph.D.; M. GeramiPour, Ph.D. The present study was conducted to investigate the impact of language learning activities on the development of spoken language in 5-6-year-old children at private preschool center...

A Trainable Visually-grounded Spoken Language Generation System

A spoken language generation system has been developed that learns to describe objects in computer-generated visual scenes. The system is trained by a ‘show-and-tell’ procedure in which visual scenes are paired with natural language descriptions. Learning algorithms acquire probabilistic structures which encode the visual semantics of phrase structure, word classes, and individual words. Using ...

Visually Grounded Learning of Keyword Prediction from Untranscribed Speech

During language acquisition, infants have the benefit of visual cues to ground spoken language. Robots similarly have access to audio and visual sensors. Recent work has shown that images and spoken captions can be mapped into a meaningful common space, allowing images to be retrieved using speech and vice versa. In this setting of images paired with untranscribed spoken captions, we consider w...

The Relationship between Self-esteem and Conversational Dominance of Iranian EFL Learners’ Speaking

The crucial role of affective factors like anxiety, inhibition, motivation and self-esteem has long been of interest in the field of language learning due to their strong association with the cognitive processes involved in performance in a second or foreign language. This study aimed at investigating the relationship between Iranian EFL learners’ self-esteem and conversational dominance in ...

Publication date: 2000